Classification of Scientific Papers Using Machine Learning
Author: Minh
Abstract
The project aims to develop a domain-independent and adaptive approach for scientific document classification that uses information from both document contents and citation links. We evaluate several content-based classification methods, including k-nearest neighbours, nearest centroid, naive Bayes and decision trees, and find that naive Bayes outperforms the others when the training set is sufficiently large. Using phrases in addition to words, together with a good feature selection strategy such as information gain, is found to improve system accuracy compared with using words only. To incorporate citation links into classification, the project proposes two methods: linear labelling update and probabilistic labelling update. Both methods iteratively update the labellings of classified documents using category information from neighbouring documents. Our experiments with the two methods show that combining contents and citations significantly improves system performance.
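The abstract does not give the exact update rules, so the following is only a minimal sketch of one plausible linear labelling update, under stated assumptions. It assumes each document already has a category-score vector from a content-based classifier (for example naive Bayes), treats citation links as an undirected neighbour list, and on each round replaces a document's scores with alpha times its content scores plus (1 - alpha) times the mean of its neighbours' current scores. The function name linear_label_update, the weight alpha and this averaging rule are hypothetical illustrations, not the project's published method.

```python
from typing import Dict, List


def linear_label_update(
    content_scores: Dict[str, List[float]],
    neighbours: Dict[str, List[str]],
    iterations: int = 10,
    alpha: float = 0.5,
) -> Dict[str, int]:
    """Iteratively mix each document's content-based category scores with the
    mean scores of its citation neighbours, then return the argmax category.

    Hypothetical rule (an assumption, not the paper's):
        new = alpha * content + (1 - alpha) * neighbour_mean
    """
    scores = {doc: list(vec) for doc, vec in content_scores.items()}
    for _ in range(iterations):
        updated: Dict[str, List[float]] = {}
        for doc, base in content_scores.items():
            # Only neighbours that are themselves in the collection contribute.
            nbrs = [n for n in neighbours.get(doc, []) if n in scores]
            if not nbrs:
                # No usable citation links: keep the content-based scores.
                updated[doc] = list(base)
                continue
            k = len(base)
            nbr_mean = [sum(scores[n][c] for n in nbrs) / len(nbrs) for c in range(k)]
            updated[doc] = [alpha * base[c] + (1 - alpha) * nbr_mean[c] for c in range(k)]
        # Synchronous update: every document in a round sees the previous round's scores.
        scores = updated
    return {doc: max(range(len(vec)), key=vec.__getitem__) for doc, vec in scores.items()}


# Toy example: content alone would put "b" in category 1 (0.55 > 0.45),
# but both of its neighbours lean strongly towards category 0.
labels = linear_label_update(
    {"a": [0.9, 0.1], "b": [0.45, 0.55], "c": [0.8, 0.2]},
    {"a": ["b"], "b": ["a", "c"], "c": ["b"]},
)
print(labels)  # {'a': 0, 'b': 0, 'c': 0}
```

Setting alpha to 1 (or iterations to 0) reduces the sketch to content-only classification, which makes it easy to compare the content-only baseline with the citation-augmented labels. A probabilistic variant would replace the weighted average with a probabilistic combination of neighbour evidence; that rule is not sketched here.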
Similar resources
A classification of abduction: abduction for logic programming
Abduction is a methodology of scientific research. Peirce showed three types of abduction and expressed them by one syllogism. Recently, various studies on abduction or abductive logic have been developed in the fields of automated reasoning and machine learning. In order to systematically understand such research and to clearly discuss abduction, this paper classifies abduction into five type...
The Case against Accuracy Estimation for Comparing Induction Algorithms
We analyze critically the use of classification accuracy to compare classifiers on natural data sets, providing a thorough investigation using ROC analysis, standard machine learning algorithms, and standard benchmark data sets. The results raise serious concerns about the use of accuracy for comparing classifiers and draw into question the conclusions that can be drawn from such studies. In the c...
A Hybrid Approach for Sentiment Analysis Applied to Paper Reviews
This article discusses the problem of extracting sentiment and opinions about a collection of articles on scientific reviews conducted under an international conference on computing in the Spanish language. The aim of this analysis is on the one hand to automatically determine the orientation of a review of an article and contrast this approach with the assessment made by the reviewer of the article. ...
A Machine Learning Approach to Building Domain-Specific Search Engines
Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new...
Modeling Classification and Inference Learning
Human categorization research is dominated by work in classification learning. The field may be in danger of equating the classification learning paradigm with the more general phenomenon of category learning. This paper compares classification and inference learning and finds that different patterns of behavior emerge depending on which learning mode is engaged. Inference learning tends to focus subj...
Publication date: 2005